195 research outputs found

    Molecular Evolution and Gene Function

    Get PDF
    One of the basic questions of phylogenomics is how gene function evolves, whether among species or inside gene families. In this chapter, we provide a brief overview of the problems associated with defining gene function in a manner which allows comparisons which are both large scale and evolutionarily relevant. The main source of functional data, despite its limitations, is transcrip-tomics. Functional data provides information on evolutionary mechanisms primarily by showing which functional classes of genes evolve under stronger or weaker purifying or adaptive selection, and on which classes of mutations (e.g., substitutions or duplications). However, the example of the "ortholog conjecture" shows that we are still not at a point where we can confidently study phylogenomically the evolution of gene function at a precise scale

    Detecting patterns of species diversification in the presence of both rate shifts and mass extinctions

    Get PDF
    Recent methodological advances are enabling better examination of speciation and extinction processes and patterns. A major open question is the origin of large discrepancies in species number between groups of the same age. Existing frameworks to model this diversity either focus on changes between lineages, neglecting global effects such as mass extinctions, or focus on changes over time which would affect all lineages. Yet it seems probable that both lineages differences and mass extinctions affect the same groups. Here we used simulations to test the performance of two widely used methods, under complex scenarios. We report good performances, although with a tendency to over-predict events when increasing the complexity of the scenario. Overall, we find that lineage shifts are better detected than mass extinctions. This work has significance for assessing the methods currently used for estimating changes in diversification using phylogenies and developing new tests.Comment: 34 pages, 11 figure

    Generating Homology Relationships by Alignment of Anatomical Ontologies

    Get PDF
    The anatomy of model species is described in ontologies, which are used to standardize the annotations of experimental data, such as gene expression patterns. To compare such data between species, we aim to establish homology relations between ontologies describing different species. We present a new algorithm, and its implementation in the software Homolonto, to create new relationships between anatomical ontologies, based on the homology concept. These relationships and the Homolonto software are available at "http://bgee.unil.ch/.":http://bgee.unil.ch

    IQRray, a new method for Affymetrix microarray quality control, and the homologous organ conservation score, a new benchmark method for quality control metrics

    Get PDF
    Motivation: Microarray results accumulated in public repositories are widely reused in meta-analytical studies and secondary databases. The quality of the data obtained with this technology varies from experiment to experiment, and an efficient method for quality assessment is necessary to ensure their reliability. Results: The lack of a good benchmark has hampered evaluation of existing methods for quality control. In this study, we propose a new independent quality metric that is based on evolutionary conservation of expression profiles. We show, using 11 large organ-specific datasets, that IQRray, a new quality metrics developed by us, exhibits the highest correlation with this reference metric, among 14 metrics tested. IQRray outperforms other methods in identification of poor quality arrays in datasets composed of arrays from many independent experiments. In contrast, the performance of methods designed for detecting outliers in a single experiment like Normalized Unscaled Standard Error and Relative Log Expression was low because of the inability of these methods to detect datasets containing only low-quality arrays and because the scores cannot be directly compared between experiments. Availability and implementation: The R implementation of IQRray is available at: ftp://lausanne.isb-sib.ch/pub/databases/Bgee/general/IQRray.R. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics onlin

    Meta-analysis of estrogen response in MCF-7 distinguishes early target genes involved in signaling and cell proliferation from later target genes involved in cell cycle and DNA repair

    Get PDF
    ABSTRACT: BACKGROUND: Many studies have been published outlining the global effects of 17 beta-estradiol (E2) on gene expression in human epithelial breast cancer derived MCF-7 cells. These studies show large variation in results, reporting between ~100 and ~1500 genes regulated by E2, with poor overlap. RESULTS: We performed a meta-analysis of these expression studies, using the Rank product method to obtain a more accurate and stable list of the differentially expressed genes, and of pathways regulated by E2. We analyzed 9 time-series data sets, concentrating on response at 3-4 hrs (early) and at 24 hrs (late). We found >1000 statistically significant probe sets after correction for multiple testing at 3-4 hrs, and >2000 significant probe sets at 24 hrs. Differentially expressed genes were examined by pathway analysis. This revealed 15 early response pathways, mostly related to cell signaling and proliferation, and 20 late response pathways, mostly related to breast cancer, cell division, DNA repair and recombination. CONCLUSIONS: Our results show that meta-analysis identified more differentially expressed genes than the individual studies, and that these genes act together in networks. These results provide new insight into E2 regulated mechanisms, especially in the context of breast cancer

    Methods detecting rhythmic gene expression are biologically relevant only for strong signal

    Get PDF
    Author summary To be active, genes have to be transcribed to RNA. For some genes, the transcription rate follows a circadian rhythm with a periodicity of approximately 24 hours; we call these genes “rhythmic”. In this study, we compared methods designed to detect rhythmic genes in gene expression data. The data are measures of the number of RNA molecules for each gene, given at several time-points, usually spaced 2 to 4 hours, over one or several periods of 24 hours. There are many such methods, but it is not known which ones work best to detect genes whose rhythmic expression is biologically functional. We compared these methods using a reference group of evolutionarily conserved rhythmic genes. We compared data from baboon, mouse, rat, zebrafish, fly, and mosquitoes. Surprisingly, no method was particularly effective. Furthermore, we found that only very strong rhythmic signals were relevant with each method. More precisely, when we use a usual cut-off to define rhythmic genes, the group of genes considered as rhythmic contains many genes whose rhythmicity cannot be confirmed to be biologically relevant. We also show that rhythmic genes mainly contain highly expressed genes. Finally, based on our results, we provide recommendations on which methods to use and how, and suggestions for future experimental designs

    Correcting for the bias due to expression specificity improves the estimation of constrained evolution of expression between mouse and human

    Get PDF
    Motivation: Comparative analyses of gene expression data from different species have become an important component of the study of molecular evolution. Thus methods are needed to estimate evolutionary distances between expression profiles, as well as a neutral reference to estimate selective pressure. Divergence between expression profiles of homologous genes is often calculated with Pearson's or Euclidean distance. Neutral divergence is usually inferred from randomized data. Despite being widely used, neither of these two steps has been well studied. Here, we analyze these methods formally and on real data, highlight their limitations and propose improvements. Results: It has been demonstrated that Pearson's distance, in contrast to Euclidean distance, leads to underestimation of the expression similarity between homologous genes with a conserved uniform pattern of expression. Here, we first extend this study to genes with conserved, but specific pattern of expression. Surprisingly, we find that both Pearson's and Euclidean distances used as a measure of expression similarity between genes depend on the expression specificity of those genes. We also show that the Euclidean distance depends strongly on data normalization. Next, we show that the randomization procedure that is widely used to estimate the rate of neutral evolution is biased when broadly expressed genes are abundant in the data. To overcome this problem, we propose a novel randomization procedure that is unbiased with respect to expression profiles present in the datasets. Applying our method to the mouse and human gene expression data suggests significant gene expression conservation between these species. Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics onlin

    The expansion of amino-acid repeats is not associated to adaptive evolution in mammalian genes

    Get PDF
    BACKGROUND: The expansion of amino acid repeats is determined by a high mutation rate and can be increased or limited by selection. It has been suggested that recent expansions could be associated with the potential of adaptation to new environments. In this work, we quantify the strength of this association, as well as the contribution of potential confounding factors. RESULTS: Mammalian positively selected genes have accumulated more recent amino acid repeats than other mammalian genes. However, we found little support for an accelerated evolutionary rate as the main driver for the expansion of amino acid repeats. The most significant predictors of amino acid repeats are gene function and GC content. There is no correlation with expression level. CONCLUSIONS: Our analyses show that amino acid repeat expansions are causally independent from protein adaptive evolution in mammalian genomes. Relaxed purifying selection or positive selection do not associate with more or more recent amino acid repeats. Their occurrence is slightly favoured by the sequence context but mainly determined by the molecular function of the gene

    Correcting for the bias due to expression specificity improves the estimation of constrained evolution of expression between mouse and human

    Get PDF
    Motivation: Comparative analyses of gene expression data from different species have become an important component of the study of molecular evolution. Thus methods are needed to estimate evolutionary distances between expression profiles, as well as a neutral reference to estimate selective pressure. Divergence between expression profiles of homologous genes is often calculated with Pearson's or Euclidean distance. Neutral divergence is usually inferred from randomized data. Despite being widely used, neither of these two steps has been well studied. Here, we analyze these methods formally and on real data, highlight their limitations and propose improvements
    corecore